library(tidyverse)
library(palmerpenguins)
data(penguins)
#install ggridges with install.packages("ggridges")
library(ggridges)ggridges::geom_density_ridges
Function of the Week
1 ggridges::geom_density_ridges()
In this document, I will introduce the geom_density_ridges() function and show what it’s for.
1.1 What is it for?
Imagine you have some sort of continuous data that you can turn into a density plot. This could be blood pressure, height, or weight. Most of these variables are normally distributed when you have a big enough population.
The function geom_density_ridges() allows you to overlay and directly compare density plots for multiple subcategories. This could be gender, species, city, month, or year.
Such a direct comparison allows us to see where distributions overlap between subcategories and where they are distinct. If time is used, they can also show how distributions of continuous variables change or remain the same over time.
1.1.1 Geom_density_ridges() Uses GGplot Structure
ggplot(dataset ) + aes() + geom_density_ridges()
plot_basic <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species) +
geom_density_ridges()
plot_basicPicking joint bandwidth of 1.08
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
If you want a line along the bottom you can add that by using geom_density_ridges2()
plot_basic_2 <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species) +
geom_density_ridges2()
plot_basic_2Picking joint bandwidth of 1.08
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
This divides the bill length into separate species of penguins, but we have other subgroups that we are curious about seeing. We can use arguments color and fill to incorporate those subgroups.
penguins %>%
drop_na(sex) %>% #there were many na values in the sex category
ggplot() +
aes(x = bill_length_mm,
y = species,
color = island,
fill = sex) +
geom_density_ridges()Picking joint bandwidth of 0.831
This can get a little cluttered, matching the y axis to a color can help simplify it:
myplot <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species,
color = island,
fill = species) +
scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) + #I had to manually match the colors because they were mismatched. scale_color_manual() allows that.
geom_density_ridges()
myplotPicking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
Other arguments:
alpha() changes the transparency of the fill color.
jittered_points() allows us to see the individual points that form the distribution curve.
myplot <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species,
color = island,
fill = species) +
scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
geom_density_ridges(alpha = 0.5, jittered_points = TRUE)
myplotPicking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
Finally, if the ridges are overlapping too much, you can change that with the scale() argument. A scale of 1 means the top of one graph will reach the bottom of the next one.
myplot <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species,
color = island,
fill = species) +
scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
geom_density_ridges(alpha = 0.5, jittered_points = TRUE, scale = 0.95)
myplotPicking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
Sometimes, having more overlap with transparency on can show areas where the distributions meet.
myplot <- ggplot(penguins) +
aes(x = bill_length_mm,
y = species,
color = island,
fill = species) +
scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
geom_density_ridges(alpha = 0.5, jittered_points = TRUE, scale = 5)
myplotPicking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
1.2 Is it helpful?
The ggridges package with the density ridges function would be helpful in observational studies that collect a large amount of data on a number of different variables with subgroups.
Additionally, an interesting graph that I saw using this function used it to compare data over time intervals, where the y axis included each month of a year and the x axis showed how temperatures varied for each of those months.
I think it these graphs can be quite informative and helpful.